Creating a new pipeline using Seurat v4.0.2 (the version available on 2021.06.23)
Important notes:
Filtering on percent.mt, but NOT regressing on percent.mt
Regressing on nCount_RNA and nFeature_RNA
Load libraries required for Seurat v4
knitr::opts_knit$set(root.dir = "~/Desktop/10XGenomicsData/msAggr_scRNASeq/")
library(dplyr)
library(Seurat)
library(patchwork)
library(ggplot2)
# library(clustree)
Store session info
sink("msAggr_seurat-v1.20210623")
sessionInfo()
sink()
ScaleData
https://bioconductor.org/packages/3.10/workflows/vignettes/simpleSingleCell/inst/doc/batch.html#62_for_gene-based_analyses
> You can also normalize and scale data for the RNA assay. There are numerous resources on this, but Aaron Lun describes why the original log-normalized values should be used for DE and visualizations of expression quite well here:
>
> > For gene-based procedures like differential expression (DE) analyses or gene network construction, it is desirable to use the original log-expression values or counts. The corrected values are only used to obtain cell-level results such as clusters or trajectories. Batch effects are handled explicitly using blocking terms or via a meta-analysis across batches. We do not use the corrected values directly in gene-based analyses, for various reasons:
> >
> > It is usually inappropriate to perform DE analyses on batch-corrected values, due to the failure to model the uncertainty of the correction. This usually results in loss of type I error control, i.e., more false positives than expected.
> >
> > The correction does not preserve the mean-variance relationship. Applications of common DE methods like edgeR or limma are unlikely to be valid.
> >
> > Batch correction may (correctly) remove biological differences between batches in the course of mapping all cells onto a common coordinate system. Returning to the uncorrected expression values provides an opportunity for detecting such differences if they are of interest. Conversely, if the batch correction made a mistake, the use of the uncorrected expression values provides an important sanity check.
> >
> > In addition, the normalized values in SCT and integrated assays don’t necessarily correspond to per-gene expression values anyway, rather containing residuals (in the case of the scale.data slot for each).
Work out how to load 4 cell populations into a single Seurat object
SET SEED?????!!!!!
projectName <- "msAggr"
jackstraw.dim <- 40
source("msAggr_AnalysisCode/read_10XGenomics_data.R")
source("msAggr_AnalysisCode/PercentVariance.R")
setwd("../cellRanger/") # temporarily changing wd only works if you run the entire chunk at once
Warning: The working directory was changed to /Users/heustonef/Desktop/10XGenomicsData/cellRanger inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
data_file.list <- read_10XGenomics_data(sample.list = c("LSKm2", "CMPm2", "MEPm", "GMPm"))
data.object <- Read10X(data_file.list)
# Seurat v3+ names this argument min.features; min.genes is the Seurat v2 name
seurat.object <- CreateSeuratObject(counts = data.object, min.cells = 3, min.features = 200, project = projectName)
Clean up to free memory
remove(data.object)
Add mitochondrial metadata and plot some basic features
seurat.object[["percent.mt"]] <- PercentageFeatureSet(seurat.object, pattern = "^mt-")
# group by sample; fill.by applies only to stacked plots
VlnPlot(seurat.object, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3, pt.size = 0, group.by = "orig.ident")
plot1 <- FeatureScatter(seurat.object, feature1 = "nCount_RNA", feature2 = "percent.mt", group.by = "orig.ident", pt.size = 0.01)
plot2 <- FeatureScatter(seurat.object, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", group.by = "orig.ident", pt.size = 0.01)
plot1 + plot2
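To make the percent.mt column concrete, here is a minimal base-R sketch of the same idea: the share of each cell's counts that come from genes matching "^mt-". The toy matrix and its gene names are made up for illustration; this is not Seurat's own code.

```r
# Toy counts matrix: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(5, 0, 3,
    1, 2, 0,
    4, 8, 7,
    0, 0, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("mt-Nd1", "mt-Co1", "Gata1", "Actb"),
                  c("cell1", "cell2", "cell3"))
)

# Rows whose names match the mitochondrial pattern used above.
mt.genes <- grepl("^mt-", rownames(counts))

# Percentage of each cell's counts that come from those rows.
percent.mt <- 100 * colSums(counts[mt.genes, , drop = FALSE]) / colSums(counts)
print(round(percent.mt, 1))
```

The pattern is case-sensitive: "^mt-" matches mouse-style gene symbols, which is what this dataset uses.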
Remove low-quality cells
require: nFeature_RNA between 200 and 4000 (inclusive)
require: percent.mt <= 5
print(paste("original object:", nrow(seurat.object@meta.data), "cells", sep = " "))
[1] "original object: 41950 cells"
seurat.object <- subset(seurat.object,
                        subset = nFeature_RNA >= 200 &
                          nFeature_RNA <= 4000 &
                          percent.mt <= 5
)
print(paste("new object:", nrow(seurat.object@meta.data), "cells", sep = " "))
[1] "new object: 37104 cells"
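A quick way to sanity-check the thresholds is to apply them to a plain data frame first. The sketch below uses a hypothetical six-cell metadata table (values invented) in place of seurat.object@meta.data, mainly to show that every bound is inclusive.

```r
# Hypothetical per-cell metadata standing in for seurat.object@meta.data.
meta <- data.frame(
  nFeature_RNA = c(150, 200, 2500, 4000, 4001, 3000),
  percent.mt   = c(1.0, 5.0, 2.5, 0.5, 1.0, 5.1),
  row.names    = paste0("cell", 1:6)
)

# Same thresholds as the subset() call above; all bounds are inclusive.
keep <- meta$nFeature_RNA >= 200 &
  meta$nFeature_RNA <= 4000 &
  meta$percent.mt <= 5

kept.cells <- rownames(meta)[keep]
print(paste("kept", sum(keep), "of", nrow(meta), "cells"))
```

Running the same logical vector against the real metadata before calling subset() would show how many cells each threshold removes on its own.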
Struggling to wrap my head around this one. It seems that SCTransform is best for batch correction, but NormalizeData and ScaleData are best for DGE. Several vignettes perform both.
`selection.method`
How to choose top variable features. Choose one of:
vst: First, fits a line to the relationship of log(variance) and log(mean) using local polynomial regression (loess). Then standardizes the feature values using the observed mean and expected variance (given by the fitted line). Feature variance is then calculated on the standardized values after clipping to a maximum (see clip.max parameter).
mean.var.plot (mvp): First, uses a function to calculate average expression (mean.function) and dispersion (dispersion.function) for each feature. Next, divides features into num.bin (default 20) bins based on their average expression, and calculates z-scores for dispersion within each bin. The purpose of this is to identify variable features while controlling for the strong relationship between variability and average expression.
dispersion (disp): selects the genes with the highest dispersion values
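The mean.var.plot description is easier to follow with numbers. Below is a rough base-R sketch of the binning idea on simulated counts: compute a mean and a dispersion per gene, bin genes by mean, then z-score dispersion within each bin. The variance-to-mean ratio, the equal-width bins, and the handling of single-gene bins are my assumptions for illustration; Seurat's mean.function and dispersion.function differ in detail.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible

# Toy data: 200 genes x 50 cells of Poisson counts, four expression levels.
n.genes <- 200
n.cells <- 50
gene.rate <- rep(c(0.5, 2, 8, 20), each = n.genes / 4)
counts <- matrix(rpois(n.genes * n.cells, lambda = gene.rate), nrow = n.genes)
rownames(counts) <- paste0("gene", seq_len(n.genes))

# Per-gene average expression and dispersion (variance-to-mean ratio here).
gene.mean <- rowMeans(counts)
gene.disp <- apply(counts, 1, var) / gene.mean

# Split genes into 20 bins by average expression (the num.bin default quoted above).
bins <- cut(gene.mean, breaks = 20)

# z-score of dispersion within each bin.
disp.z <- ave(gene.disp, bins, FUN = function(x) (x - mean(x)) / sd(x))

# Bins with a single gene have no standard deviation; set those to 0 (an assumption).
disp.z[is.na(disp.z)] <- 0

# Genes with the highest within-bin dispersion z-scores.
top.genes <- names(sort(disp.z, decreasing = TRUE))[1:10]
print(top.genes)
```

Because the z-scores are computed per bin, a highly expressed gene is only compared against genes of similar expression, which is the point of controlling for the mean-variability relationship.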
seurat.object <- NormalizeData(seurat.object, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
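For reference, the LogNormalize method amounts to: divide each cell's counts by that cell's total, multiply by scale.factor, then take log(1 + x). A minimal base-R sketch on a made-up 3 x 2 matrix (not Seurat's implementation, which works on sparse matrices):

```r
# Toy counts: genes in rows, cells in columns (values are made up).
counts <- matrix(
  c(1, 0,
    3, 5,
    6, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)
scale.factor <- 10000

# Divide each column by its total, multiply by the scale factor, then log(1 + x).
lognorm <- log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
print(round(lognorm, 3))
```

Zero counts stay at zero after the transform, and undoing the log with expm1() gives columns that each sum to scale.factor.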
Find variable features
seurat.object <- FindVariableFeatures(seurat.object, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
top10 <- head(VariableFeatures(seurat.object), 10)
plot1 <- VariableFeaturePlot(seurat.object)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
When using repel, set xnudge and ynudge to 0 for optimal results
plot1 + plot2
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Removed 2 rows containing missing values (geom_point).
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Removed 2 rows containing missing values (geom_point).
Scale data (linear transformation)
all.genes <- rownames(seurat.object)
seurat.object <- ScaleData(seurat.object, features = all.genes, vars.to.regress = c("nCount_RNA", "nFeature_RNA"))
Regressing out nCount_RNA, nFeature_RNA
[ScaleData progress bar, 0% to 99%]
Warning in doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation
|
|================================================================================================================================================================================= | 100%
|
|==================================================================================================================================================================================| 100%
Centering and scaling data matrix
save.image(file = paste0(projectName, '.RData'))
Linear dimensional reduction. By default, RunPCA uses the VariableFeatures, but a different feature set can be supplied.
seurat.object <- RunPCA(seurat.object, features = VariableFeatures(object = seurat.object))
PC_ 1
Positive: Car2, Car1, Blvrb, Klf1, Mt2, Vamp5, Ermap, Aqp1, C1qtnf12, Rhd
Sphk1, Ces2g, Tspo2, Cldn13, Gm15915, Slc38a5, Gstm5, Smim1, Abcb4, Gclm
Ybx3, Nxpe2, Ctse, Cenpv, Stard10, Mns1, Gata1, Slc25a21, Sdsl, Epor
Negative: Tmsb4x, Prtn3, Mpo, Ctsg, Tyrobp, Plac8, Elane, Clec12a, Ly6c2, Slpi
Sh3bgrl3, Anxa3, Pkm, Hp, Emb, Fcer1g, Ms4a3, Pgam1, Ms4a6c, Irf8
Coro1a, Serpinb1a, BC035044, Lgals1, H2afy, Arhgdib, Igsf6, Alas1, Ly86, Cst3
PC_ 2
Positive: Ctla2a, Malat1, Ifitm1, Hlf, Gimap6, Jund, Fos, Tsc22d1, Ltb, Gimap1
Tmem176b, Pim1, Adgrl4, Ifitm3, Sox4, Zfp36, Adgrg1, Cd34, Gcnt2, Rgs1
Klf2, Gm5111, Cd27, Ypel3, Klf6, Gm19590, Dusp1, Dusp2, Ifi203, Shisa5
Negative: H2afz, Ybx1, Ppia, Atp5g1, Ran, Mt1, Ranbp1, Cycs, Nme1, Hmgb2
Cks2, Slc25a5, Cox5a, Cks1b, Ap3s1, Atpif1, Dut, Eif5a, Ptma, Sdf2l1
Nhp2, Dynll1, Elane, Mrpl18, Atp5o, Ms4a3, Tmem14c, Chchd2, Ly6c2, Birc5
PC_ 3
Positive: Ube2c, Nusap1, Plbd1, Cenpf, Birc5, Lgals3, Aif1, Mki67, Id2, Hmmr
Top2a, Prc1, Ifi205, Cdca8, Kif23, Batf3, H2afx, Cenpa, Tpx2, Ccnb1
Ccnb2, Kif22, Cdc20, Plk1, Cenpe, Hmgb2, Pimreg, Racgap1, H2-Aa, Cdca3
Negative: Cd63, Srgn, Mif, Ung, Prtn3, Rgcc, Ms4a3, Srm, Gstm1, Elane
Nkg7, Hspd1, Cebpe, Alas1, Fkbp11, Hsp90ab1, Mpo, Ctsg, Prdx6, Trem3
Calr, Gsr, Serpinb1a, Rps2, Nme1, Prss57, Anxa3, Cst7, Fcgr3, Rack1
PC_ 4
Positive: Srgn, Nkg7, Itga2b, Cd9, Pf4, Ube2c, Lockd, Apoe, Cavin2, Nusap1
Cenpf, H1fx, Cks2, Serpine2, Cd63, Gata2, Hmmr, Pimreg, Cdc20, Tuba8
Ckap2l, Pbx1, Prc1, Cdca8, Ccnb2, Pdcd4, Rab27b, Tpx2, Malat1, Rgs18
Negative: Aif1, Ctss, Irf8, Cd74, Lsp1, Plbd1, Lgals3, H2-Aa, Cd52, Id2
Ifi205, H2-Eb1, Ccr2, Pld4, Batf3, Psap, Ighm, Mpeg1, H2-Ab1, Ckb
Ly86, Itgb7, Ms4a4c, Ifi30, Ctsh, Fth1, Naaa, Ms4a6c, Tmsb10, Jaml
PC_ 5
Positive: Ftl1, S100a8, Pglyrp1, Clec4a2, Rgcc, Gstm1, S100a6, Trem3, Gda, Wfdc21
Lgals3, Dstn, Slfn2, Prdx5, Hp, H2-Eb1, Ly6c2, H2-Aa, Cd74, Cebpe
Gng11, Cd63, Mcemp1, Fcgr3, Ms4a3, Mmrn1, Mt1, Hacd4, Pdzk1ip1, Selenom
Negative: Stmn1, H2afy, Ptma, Plac8, Hsp90ab1, Tmsb10, Rps2, Cd34, Bcl2, Hspa8
Sox4, Npm1, Hist1h2ap, Cpa3, Cd48, Tespa1, Satb1, Ccl9, Fos, Ppia
Hist1h2ae, Hist1h2ac, Hmgb1, Lat2, Fabp5, BC035044, Jun, Dntt, Adgrg3, Egr1
Plot results
VizDimLoadings(seurat.object, dims = 1:6, nfeatures = 10, reduction = "pca", ncol = 2)
DimPlot colored by orig.ident
DimPlot(seurat.object, reduction = "pca", group.by = "orig.ident")
Let’s put in a concerted effort to pick the right dimensionality using the newest software
# jackstraw.dim <- 40
# seurat.object <- JackStraw(seurat.object, num.replicate = 100, dims = jackstraw.dim) #runs ~50 min
# seurat.object <- ScoreJackStraw(seurat.object, dims = 1:jackstraw.dim)
# save.image(paste0(projectName, ".RData"))
Draw dim.reduction plots
# JackStrawPlot(seurat.object, dims = 25:36)
ElbowPlot(seurat.object, ndims = 50)
percent.variance(seurat.object@reductions$pca@stdev)
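percent.variance() is a helper whose definition is not shown in this part of the notebook. As a rough sketch of what such a helper presumably computes, the share of variance per PC can be derived from the PCA standard deviations. The function name `pct_variance` and the toy `stdev` values below are hypothetical, and the shares are relative to the PCs that were computed, not to the total variance in the data.

```r
# Hypothetical stand-in for the notebook's percent.variance() helper:
# variance share of each PC, relative to all PCs that were computed.
pct_variance <- function(stdev) {
  stdev^2 / sum(stdev^2) * 100
}

stdev <- c(10, 6, 4, 3, 2, 1)   # toy standard deviations, not real data
pct <- pct_variance(stdev)
cum_pct <- cumsum(pct)

# Number of PCs whose cumulative share stays at or below 80%
n_below <- length(which(cum_pct <= 80))
# First PC at which the cumulative share reaches 80%
n_reach <- which(cum_pct >= 80)[1]
```

Note that the `<=` form counts the PCs that stay under the threshold, so it can be one PC short of actually reaching it; `n_reach` is the alternative if the threshold must be met.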
Number of PCs describing X% of variance
tot.var <- percent.variance(seurat.object@reductions$pca@stdev, plot.var = FALSE, return.val = TRUE)
paste0("Num pcs for 80% variance:", length(which(cumsum(tot.var) <= 80)))
[1] "Num pcs for 80% variance:12"
paste0("Num pcs for 85% variance:", length(which(cumsum(tot.var) <= 85)))
[1] "Num pcs for 85% variance:18"
paste0("Num pcs for 90% variance:", length(which(cumsum(tot.var) <= 90)))
[1] "Num pcs for 90% variance:26"
paste0("Num pcs for 95% variance:", length(which(cumsum(tot.var) <= 95)))
[1] "Num pcs for 95% variance:37"
Exported cell IDs for clusters 3, 17, 10, 11 from Seurat v1. Will add these IDs as a metadata column.
Create column “clust.ID” and populate with 0’s. Then import IDs for clusters.
Add new metadata column and map new ids
clust3.cells <- read.table(file = "Seuratv1_clusterCellIDs/cluster3cellIDs.txt", col.names = "clust03")
clust3.cells <- sapply(clust3.cells, function(x) paste0(gsub("CMP", "CMPm2", x), "-1"))
clust17.cells <- read.table(file = "Seuratv1_clusterCellIDs/cluster17cellIDs.txt", col.names = "clust17")
clust17.cells <- sapply(clust17.cells, function(x) paste0(gsub("CMP", "CMPm2", x), "-1"))
clust10.cells <- read.table(file = "Seuratv1_clusterCellIDs/cluster10cellIDs.txt", col.names = "clust10")
clust10.cells <- sapply(clust10.cells, function(x) paste0(gsub("CMP", "CMPm2", x), "-1"))
clust11.cells <- read.table(file = "Seuratv1_clusterCellIDs/cluster11cellIDs.txt", col.names = "clust11")
clust11.cells <- sapply(clust11.cells, function(x) paste0(gsub("CMP", "CMPm2", x), "-1"))
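Before trusting the imported IDs, it may help to check how many of the converted barcodes are actually present in the new object. The sketch below uses toy data: the barcode strings and the `meta` data frame are hypothetical stand-ins for the exported ID files and for seurat.object@meta.data, and the real barcode format may differ.

```r
# Hypothetical Seurat v1 style IDs (the real format may differ)
old.ids <- c("CMP_AAAC", "CMP_AAAG", "CMP_AACT")

# Same conversion as in the notebook: rename the sample prefix and add the "-1" suffix
new.ids <- paste0(gsub("CMP", "CMPm2", old.ids), "-1")

# Toy stand-in for seurat.object@meta.data, with cell barcodes as row names
meta <- data.frame(
  clust.ID = rep(0, 3),
  row.names = c("CMPm2_AAAC-1", "CMPm2_AAAG-1", "LSKm2_TTTT-1")
)

# Which converted IDs exist in the new object?
matched <- new.ids %in% rownames(meta)
n.matched <- sum(matched)

# Label the matching cells with their old cluster number
meta$clust.ID[rownames(meta) %in% new.ids] <- 3
label.counts <- table(meta$clust.ID)
```

A low match rate would point to a barcode naming problem rather than to cells that were genuinely dropped between pipelines.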
seurat.object@meta.data['clust.ID'] <- 0
seurat.object@meta.data$clust.ID[rownames(seurat.object@meta.data) %in% clust3.cells] <- 3
seurat.object@meta.data$clust.ID[rownames(seurat.object@meta.data) %in% clust17.cells] <- 17
seurat.object@meta.data$clust.ID[rownames(seurat.object@meta.data) %in% clust10.cells] <- 10
seurat.object@meta.data$clust.ID[rownames(seurat.object@meta.data) %in% clust11.cells] <- 11
Do the numbers make sense? (We don’t expect the counts to be exactly the same as the numbers in the original clusters.)
nrow(seurat.object@meta.data[seurat.object@meta.data$clust.ID == 10,])
[1] 1049
nrow(seurat.object@meta.data[seurat.object@meta.data$clust.ID == 11,])
[1] 1118
nrow(seurat.object@meta.data[seurat.object@meta.data$clust.ID == 17,])
[1] 883
nrow(seurat.object@meta.data[seurat.object@meta.data$clust.ID == 3,])
[1] 1931
They make enough sense!
Let’s do some cluster analyses and see if we can find these populations in our new analysis.
set total.var <- 90%
## Ideal resolution…? (Is this a thing?)
### Color palette
color.palette <- c(
"coral",
"chartreuse4",
"goldenrod1",
"cadetblue1",
"burlywood",
"brown",
"brown1",
"blue",
"blue4",
"azure3",
"aquamarine",
"antiquewhite",
"cadetblue",
"gold3",
"black",
"darkgreen",
"deeppink",
"darkviolet",
"darkturquoise",
"darkslategray",
"darksalmon",
"darkorchid1",
"darkolivegreen2",
"forestgreen",
"dodgerblue",
"green",
"lightpink",
"lightcoral",
"khaki1",
"maroon",
"peru",
"lightseagreen",
"lightsalmon",
"plum",
"moccasin",
"tan",
"tan1",
"red",
"purple",
"khaki4",
"black",
"plum4"
)
Plot UMAP
tot.var <- percent.variance(seurat.object@reductions$pca@stdev, plot.var = FALSE, return.val = TRUE)
ndims <- length(which(cumsum(tot.var) <= 90))
print(ndims)
[1] 26
seurat.object <- FindNeighbors(seurat.object, dims = 1:ndims)
Computing nearest neighbor graph
Computing SNN
seurat.object <- FindClusters(seurat.object, resolution = 0.5)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 37104
Number of edges: 1255141
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8962
Number of communities: 13
Elapsed time: 12 seconds
seurat.object <- RunUMAP(seurat.object, dims = 1:ndims)
19:12:03 UMAP embedding parameters a = 0.9922 b = 1.112
19:12:03 Read 37104 rows and found 26 numeric columns
19:12:03 Using Annoy for neighbor search, n_neighbors = 30
19:12:03 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:12:08 Writing NN index file to temp file /var/folders/4f/fwrj6fnn1dn4g8wsf0zv563hjsvl24/T//RtmptQx8Ib/filea01940da1192
19:12:08 Searching Annoy index using 1 thread, search_k = 3000
19:12:23 Annoy recall = 100%
19:12:24 Commencing smooth kNN distance calibration using 1 thread
19:12:26 Initializing from normalized Laplacian + noise
19:12:28 Commencing optimization for 200 epochs, with 1544862 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
19:12:55 Optimization finished
# saveRDS(seurat.object, file = paste0(projectName, "_dim", ndims, ".RDS"))
for (x in c(0.5, 1, 1.5, 2, 2.5)) {
  seurat.object <- FindClusters(seurat.object, resolution = x)
}
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 37104
Number of edges: 1255141
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8962
Number of communities: 13
Elapsed time: 12 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 37104
Number of edges: 1255141
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8602
Number of communities: 23
Elapsed time: 9 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 37104
Number of edges: 1255141
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8355
Number of communities: 30
Elapsed time: 9 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 37104
Number of edges: 1255141
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8156
Number of communities: 36
Elapsed time: 10 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 37104
Number of edges: 1255141
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8006
Number of communities: 42
Elapsed time: 10 seconds
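Each FindClusters run above stores its assignments in a metadata column whose name starts with RNA_snn_res, so the number of clusters per resolution can be collected in one place instead of being read off the console log. A minimal sketch on toy data, where `meta` is a hypothetical stand-in for seurat.object@meta.data:

```r
# Toy stand-in for seurat.object@meta.data with two resolution columns
meta <- data.frame(
  RNA_snn_res.0.5 = factor(c(0, 0, 1, 1, 1, 0)),
  RNA_snn_res.1   = factor(c(0, 1, 2, 2, 3, 1)),
  orig.ident      = c("LSKm2", "LSKm2", "CMPm2", "MEPm", "GMPm", "GMPm")
)

# Pick out the clustering columns by their name prefix
res.cols <- grep("^RNA_snn_res", colnames(meta), value = TRUE)

# Number of non-empty clusters at each resolution
n.clusters <- sapply(res.cols, function(col) nlevels(droplevels(meta[[col]])))
```

On the real object this should reproduce the "Number of communities" values reported in the log above.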
Generate statistics for each cluster/resolution combo
for (meta.col in colnames(seurat.object@meta.data)) {
  if (grepl(pattern = "RNA_snn_res", x = meta.col)) {
    myplot <- DimPlot(seurat.object,
                      group.by = meta.col,
                      reduction = "umap",
                      cols = color.palette
    ) +
      ggtitle(paste0(projectName, " dim", ndims, "res", gsub("RNA_snn_res", "", meta.col)))
    plot(myplot)
  }
}
For each resolution, what percentage of cells in each cluster are enriched for one of our clust.IDs?
Test: what percentage of each new clusterID matches one of the older clusters?
current_res <- 'RNA_snn_res.0.5'
meta.df <- seurat.object@meta.data
cluster_ids <- sort(unique(meta.df[[current_res]]))
samples <- c("LSKm2", "CMPm2", "MEPm", "GMPm")
counts_df <- data.frame(matrix(nrow = length(cluster_ids), ncol = length(samples)))
rownames(counts_df) <- cluster_ids
colnames(counts_df) <- samples
# count the cells from each sample (orig.ident) that fall in each cluster
# (the CMPm2 column previously counted LSKm2 cells by mistake)
for (id in as.character(cluster_ids)) {
  for (s in samples) {
    counts_df[id, s] <- sum(meta.df[[current_res]] == id & meta.df$orig.ident == s)
  }
}
Absolutely terrible overlap: none of the old clusters is enriched in any of the new clusters. Maybe we should try covering 95% of the variance.
Time for the super scary moment: do the cells from Seurat v1 still cluster together in Seurat v4?
for (meta.col in colnames(seurat.object@meta.data)) {
  if (grepl(pattern = "RNA_snn_res", x = meta.col)) {
    new.clusters <- sort(as.numeric(levels(seurat.object@meta.data[[meta.col]])))
    enrich.df <- data.frame(matrix(ncol = 4, nrow = length(new.clusters)))
    colnames(enrich.df) <- c(3, 17, 10, 11)
    rownames(enrich.df) <- new.clusters
    meta.df <- seurat.object@meta.data
    for (row.id in rownames(enrich.df)) {
      # total number of cells in this new cluster
      tot.clus <- nrow(meta.df[meta.df[[meta.col]] == row.id,])
      for (col.id in colnames(enrich.df)) {
        # cells in this new cluster that carry the old cluster label
        num.x <- nrow(meta.df[(meta.df[[meta.col]] == row.id) & (meta.df$clust.ID == col.id),])
        pct.x <- as.integer(num.x / tot.clus * 100)
        # print(pct.x)
        enrich.df[row.id, col.id] <- pct.x
      }
    }
    colnames(enrich.df) <- sapply(colnames(enrich.df), function(x) paste0("oldcluster", x))
    rownames(enrich.df) <- sapply(rownames(enrich.df), function(x) paste0("newcluster", x))
    # NB: with append = TRUE, rerunning this chunk against an existing workbook fails with
    # "The workbook already contains a sheet of this name"; delete the .xlsx file before rerunning
    xlsx::write.xlsx(enrich.df, file = paste0("PctOfNewClustersOverlappingOldClusters_", projectName, "_dim", ndims, ".xlsx"), sheetName = paste0(gsub("RNA_snn_", "", meta.col)), append = TRUE)
    print(enrich.df)
  }
}
Error in .jcall(wb, "Lorg/apache/poi/ss/usermodel/Sheet;", "createSheet", :
java.lang.IllegalArgumentException: The workbook already contains a sheet of this name
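The same overlap percentages can also be obtained without nested loops by cross-tabulating the new clusters against the old cluster labels. This is a sketch on toy data, not the notebook's own method: `meta` is a hypothetical stand-in for seurat.object@meta.data, with `new.cluster` playing the role of one RNA_snn_res column.

```r
# Toy stand-in for seurat.object@meta.data; clust.ID 0 means "not in an old cluster"
meta <- data.frame(
  new.cluster = factor(c(0, 0, 0, 0, 1, 1, 2, 2, 2, 2)),
  clust.ID    = c(3, 3, 0, 0, 17, 17, 0, 10, 10, 10)
)

# Rows: new clusters; columns: old cluster labels
tab <- table(new = meta$new.cluster, old = meta$clust.ID)

# Percentage of each new cluster (row) that carries each old label
pct <- round(prop.table(tab, margin = 1) * 100)
```

Unlike the loop above, this rounds rather than truncates, and it keeps the unlabeled cells (clust.ID 0) as their own column, which makes it easy to see how much of a new cluster is not covered by any old cluster.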
DimPlot(seurat.object,
        reduction = "umap",
        group.by = "clust.ID",
        # split.by = "orig.ident",
        cols = c("gray", "orange", "blue", "red", "green"))
set total.var <- 95%
DimPlot(seurat.object,
        reduction = "umap",
        group.by = "orig.ident",
        split.by = "clust.ID",
        cols = c("gray", "orange", "blue", "red", "green"))
Plot UMAP
tot.var <- percent.variance(seurat.object@reductions$pca@stdev, plot.var = FALSE, return.val = TRUE)
ndims <- length(which(cumsum(tot.var) <= 95))
print(ndims)
[1] 37
seurat.object <- FindNeighbors(seurat.object, dims = 1:ndims)
Computing nearest neighbor graph
Let’s see if we can get some gene expression profiles on these…
Must ensure we have the right cluster stability, that is, cells that start in the same cluster tend to stay in the same cluster. If your data is over-clustered, cells will bounce between groups.
Following [this tutorial by Matt O.](https://towardsdatascience.com/10-tips-for-choosing-the-optimal-number-of-clusters-277e93d72d92).
Previously my favourite has been Clustree, which gives a nice visual.
NB: For some reason clustree::clustree() didn’t work, whereas library(clustree) followed by clustree() did.
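Cluster stability between two resolutions can also be put into numbers. The sketch below is a simplified illustration in base R, similar in spirit to the proportions that a clustering tree visualizes but not clustree's implementation: for each cluster at the higher resolution, it computes the largest fraction of its cells that come from a single cluster at the lower resolution. The vectors `low` and `high` are toy assignments standing in for two RNA_snn_res columns.

```r
# Toy cluster assignments for the same 8 cells at two resolutions
low  <- c(0, 0, 0, 0, 1, 1, 1, 1)
high <- c(0, 0, 1, 1, 2, 2, 2, 1)

# Rows: higher-resolution clusters; columns: lower-resolution clusters
tab <- table(high = high, low = low)

# For each higher-resolution cluster, the share of cells from its main parent cluster
in.prop <- apply(prop.table(tab, margin = 1), 1, max)
```

Values near 1 mean a cluster splits cleanly out of a single parent; lower values mean its cells are drawn from several parents, which is the "bouncing between groups" behaviour described above.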
These data suggest that node stability is awful! Need to figure out whether this is a dimensional reduction error or a clustering error.
Differences could include:
* cells in each population (cellranger v6 includes more cells than cellranger v1, especially in MEP)
* dimensionality is incorrect
* ScaleData didn’t account for regression factors (e.g., “nCount_RNA” or “nFeature_RNA”)
* Did not consider cell cycle
* Incorrect normalization/scaling method
* Clustering is too strict or not strict enough
* neighborhood analysis used wrong parameters
* Should include mitoC filter (there’s a chunk of MEP w/ mitoC @ ~40%)
* SCTransform accounts better for sources of variability
Looks like MEPm is the only sample with that huge MitoC % lump @ 40%. What do these cells look like, otherwise?
Save dim36 as is and try clustering analysis @ dim24
One possibility is that I’ve included too many dimensions. Will see if 90% increases stability.
Save object
I think I’ll explore regression factors using SCTransform in a new document.